perm filename CRITIQ[4,KMC] blob sn#156971 filedate 1975-04-18 generic text, type T, neo UTF8
\\M0FIX25X;\.






\F0\CTEN CRITICISMS OF PARRY

\CKenneth Mark Colby


	Much of the Artificial Intelligence community is now aware of
a  computer  simulation  of paranoid processes developed by the Colby
group at Stanford.  The model (called PARRY) has been  available  for
interviewing  on  the  ARPA  network and thousands of interviews have
been conducted with several versions of the model.  During  the  long
period  of  development  of  the  model,  we  have  been aware of the
limitations  of  various  alternative   programming   approaches   to
designing  an  algorithm  capable  of  conducting  useful non-trivial
dialogue in natural language. Colleagues,  associates,  and  students
have   volunteered  a  number  of  criticisms  of  the  model.  Since
criticisms can be endless, I shall restrict the  discussion  to  only
those which we consider serious, reasoned, and well-founded.
	Workers in A-I come from different  intellectual  traditions.
One's  intellectual background influences one's image of what a model
is and how it should function. Those from  mathematical  and  logical
backgrounds  like  to  see  lots  of  deductive inference; those from
physics and chemistry like to see laws; those from the life  sciences
like  to  see  complexity,  growth,  and development represented in a
model.     It  is  important  that  we  recognize  and  respect   the
traditions  and  philosophies of both the demonstrative and empirical
sciences.  Those raised on a Euclidean model  of  knowledge  seek  to
understand  phenomena using a few definitions and axioms, a few rules
of inference, long chains of inference,  and  deductive  consistency.
Some   aspects  of  experience  yield  to  this  approach  but  many,
especially in the case of living organisms, do not.
	Everyone  realizes  that  a model represents a simplification
and an idealization.  In constructing a model, only a  few  variables
are  selected  as  centrally relevant while the rest are neglected as
secondary or unknown.    Only a few  relations  between  the  relevant
variables  are  introduced.  Thus a model does not match exactly that
which it models in all details.  It is  partial  in  that  only  some
aspects  of  the  referent  system  are  represented  and  it  is  an
approximation in that it is limited in depth and not free of error. A
model  is  an idealization in that it may utilize abstractions and it
may possess perfect properties known to be  lacking  in  its  natural
counterpart.  Hence the model's knowledge is not as extensive as that
of a person  and  it  possesses  a  perfect  memory  unencumbered  by
inhibitory  processes.  We  can  allow ourselves this idealization of
perfect memory because we  are  not  studying,  for  example,  memory
decay,  since  we  do  not  consider it to be a pertinent variable in
paranoid processes.
	These points are discussed in greater detail in a forthcoming
monograph (Colby,1974).
	I  shall  list  ten  major criticisms of the model which have
come to our attention and attempt replies to each.

CRITICISM #1:
	PARRY  is  simply  a  stimulus-response model.  It recognizes
something  in  the  input  and  then  just  responds  to  it  without
"thinking"  or  inferring.    The model should interpret what it sees
and engage in more computation than execution of a simple rewrite  or
production rule.

REPLY:
	It is true that in early versions of the model  many  of  the
responses  consisted  of  simple  rewrites,  e.g.      when the input
consisted of "Hello", the output response was "Hi" and no rules other
than  of  the  type  "see  x, say y" were involved.(Colby, Weber, and
Hilf, 1971). But as we began to improve and extend  the  model,  this
type  of  response  disappeared. PARRY no longer consists of a single
program; rather it is a system of programs. In the  current  version,
the  model  consists  of  two modules, one for recognition of natural
language expressions  and  one  for  response.  Once  the  recognizer
decides  what  is  being said, the response module, using a number of
tests and rules, decides how to respond.     The output action of the
model  is  now a function of input, beliefs, affects, and intentions.
Thus a "Hello" no longer receives an automatic "Hi" but may receive a
variety  of  responses  depending  on  a  large number of conditions,
including a "model" of the interviewer which PARRY builds  up  during
the  interview.    This  representation  of  the interviewer involves
making inferences about his competence, his helpfulness, etc.

CRITICISM #2:
	PARRY'S language recognition processes do not analyze natural
language input sufficiently. They only try to match patterns and thus
they are naive and simplistic linguistically.

REPLY:
	PARRY  does  not utilize a grammar in processing its input of
everyday conversational English. Whereas grammar-based parsers may be
sophisticated   linguistically,  they  are  too  fragile  to  operate
satisfactorily in real-time interviews allowing unrestricted English.
PARRY'S language-recognition module uses pattern-matching rules which
attempt  to   characterize   input   expressions   by   progressively
transforming  them  into patterns which match, completely or fuzzily,
abstract stored patterns. The power of  this  approach  lies  in  its
ability  to  ignore recognized and unrecognized words and still grasp
the meaning of the message. (Colby, Parkison, and Faught, 1974).
	Our problem was not to develop or apply a  linguistic  theory
nor  to  assert  hypotheses  about  how people process language.  Our
problem was to design a working algorithm which  recognizes  what  is
being  said in a dialogue in order to make a linguistic response such
that a sample of I-O pairs from the paranoid model is judged  similar
to   a   sample   of   I-O  pairs  from  paranoid  patients.  Seeking
effectiveness  in  real-time  with  unrestricted  input,  we  took  a
straightforward A-I approach to the problem. This approach has proved
to be adequate for our purposes.

CRITICISM #3:
	PARRY's  performance  constitutes an illusion.    The model's
data-base knowledge is too limited to represent adequately all that a
person  knows.    Because  the model can answer a few questions well,
people (having  many  tacit  expectations  and  presuppositions)  are
easily  fooled into believing PARRY is capable of answering the great
variety of questions a person is capable of  answering.  People  will
assume  there  is  much more there than there really is.   Thus PARRY
represents a mirage, a conjurer's trick in which the audience is  led
to believe something is true when it is not.

REPLY:
	One  of Descartes' tests for distinguishing man from machines
was that the latter "did not act from knowledge  but  only  from  the
disposition  of  their  organs".   (Descartes'  other  test concerned
linguistic variety). Granted that a model of a psychological  process
should  contain  knowledge,  the questions become, how much knowledge
and how is it to be represented?
	Since   a   model   is  a  simplification,  it  has  boundary
conditions.    A model of a paranoid patient  is  a  model  of  being
paranoid,  being a patient, and being a person. PARRY does reasonably
well in the first two of these "beings".    It  fails  in  the  third
because of limited knowledge. How can we decide what the model should
know?  It is theoretically trivial to add tomes of facts to the  data
base, but  this seems to be what some A-I critics want. The fact that
PARRY can discuss some topics rather well indicates it is  doing  the
right things in these domains and could do well in other domains that
are functionally similar. Simply adding facts without  improving  the
algorithm can lead to a degradation of performance as experience with
belief-system simulations and theorem-proving programs has shown.
	More important than sheer number of facts  is  how  they  are
organized,  how they are represented, and how they are handled by the
processing rules to contribute to the characteristic  performance  of
the  model.   Some seem happy to know there are fixed propositions or
"frames" in  the  data-base  which  can  be  consulted  in  answering
questions.  Even  if  a  model  can answer 50 questions about a topic
using rewrite rules, some would say the model does not really  "know"
anything about the topic. The procedural- declarative argument has no
end in sight.  It  seems  to  be  a  matter  of  personal  style  and
efficiency.
	PARRY  is  not a literal copy of a total person.  The test of
adequacy here is not Turing's machine-question-"which is  person  and
which  is  machine?" This is not a stringent test, since the criteria
for distinguishing what is human behavior over a  teletype  have  not
been systematically  worked out, i.e., almost anything is accepted as
being human.(Colby, Hilf, Weber, and Kraemer,1972). PARRY is not  the
real  thing;  it  is a model, a simulation, an imitation, a mind-like
artifact, an automaton, synthetic and artificial.     The real thing,
a  living  person, is characterized by such great logical complexity,
inhomogeneity  of  class,  and  individuality  that  a  strategy   of
simplification is called for.

CRITICISM #4:
	PARRY  models  paranoid  behavior   without   modelling   the
underlying mechanisms of paranoid processes. Because the I-O behavior
of PARRY is indistinguishable  from  the  I-O  behavior  of  paranoid
patients, it does not mean that the same mechanisms are involved.

REPLY:
	This is so true as to be an A-I truism.      When  the  inner
mechanisms of a system are inaccessible to observation, one must make
plausible guesses as to what is going on.   These  guesses  represent
analogies.   They are not to be taken as the "same" mechanisms. If we
knew the "real"  mechanisms,   there  would   be  no  need  to  posit
analogies about a hidden reality. We try to design structures to fill
in more and more of the black box.     Further  empirical  tests  and
experiments  are necesary to increase the plausibility of the analogy
proposed. Successful predictions and  pragmatic  usefulness  increase
the  acceptability  of  the model to the relevant expert community or
communities.
	We  can  never know with certainty whether a model is "true".
If it is consistent with itself and with  the  data  of  observation,
then  it is valuable cognitively and pragmatically. Such coherence is
not a definition of truth but a criterion for truth.
	An expert community has various criteria for acceptability of
a   model.   Sometimes  it  is  demanded  that  a  model  provide  an
explanation.  What  constitutes  an  explanation   may   range   from
describing  causes  to  making  intelligible  the connections between
input and output. An extreme view is that science  does  not  explain
anything;  A is simply interpreted in terms of B and B in terms of C,
etc.
	A  pragmatic criterion for a model is whether it represents a
workable possibility.    Can  it  be tested and measureably  improved
as a result of these tests? That is, is there an evaluation procedure
for  cumulative progress?  In the case of PARRY, the answer to  these
questions is "yes". (Colby and Hilf, 1974).

CRITICISM #5:

	PARRY is an ad hoc model. It is designed after  the  fact  to
fit a limited set of special cases and lacks generality.

REPLY:
	Sometimes    this    criticism    is    levelled    at    the
language-recognition processes and sometimes  at  the  scope  of  the
model.  The  language  recognizer  of PARRY is a pattern-matcher. The
surface English input expressions are transformed into more  abstract
patterns  which are matched against stored patterns.  The many-to-one
tranformation involves synonymic-translations and word-classes.  Thus
the  language-recognizer  has some generality in that these processes
can be used by any "host" system which takes natural language input.
	It is true that PARRY is circumscribed.    It "explains"  the
data  it  was designed to explain. One wants to achieve at least this
degree of explanatory power in a model. But can it predict a new fact
or  fit  a  new fact discovered in some other way?  This view sees ad
hocness,  not as a property of a model, but as a relation between two
consecutive  models  or  theories.      Does  PARRY  have  some novel
consequence compared to  its  predecessor?     One  trouble  is  that
predecessor  formulations  explaining  paranoia  have been so vaguely
stated as to be untestable.  The theory embodied  in  the  model  has
novel consequences compared to other formulations.
	For example, the theory posits  that  the  paranoid  mode  of
thought   involves  symbol-processing  strategies  which  attempt  to
forestall or minimize the affect of humiliation.  A novel consequence
of  this theory is that if a person were desensitized to the negative
affect of  humiliation,  he  would  be  less  prone  to  utilize  the
strategies of the paranoid mode.

CRITICISM #6:
	PARRY'S paranoid behavior is strictly the  result  of  canned
paranoid-like  responses. Granted that PARRY is diagnosed as paranoid
by expert judges, this diagnosis is not a consequence of  the  theory
embodied  in  the  model but is simply produced by the model's canned
replies which are linguistically paranoid in nature.

REPLY:
	
	This is a weighty  criticism  because  it  implies  that  the
theory  of humiliation and the rules of the model are excess baggage.
The  made-up  output  replies  are  so  typical  of  paranoid  verbal
responses  that  they  alone might be sufficient to simulate paranoid
interactions.
	Given that a model had a list of paranoid-like responses,  it
would still need some mechanism or rules for selecting which response
to output in reply to a specific input. Experiments have  shown  that
random selection from this list results in an inadequate performance.
For example, on a dimension of "thought disorder" on a 0-9 scale,  (0
means  zero  amount  and  9  means  a  large  amount), a random model
received a mean rating of 5.94 from  expert  psychiatrists.  Patients
rated  by  the  same  judges received a mean rating of 2.99 whereas a
version of PARRY was rated at 3.78. (Colby and Hilf, 1974).
	Little is known about how to generate surface  English  which
is appropriate to the input  and phrased in a  characteristic  style.
Segment-by-segment generation or even word-by-word  generation  would
be  preferable  to  outputting  canned sentences as long as the rules
posited for the paranoid mode were somehow called into  play  in  the
generation  process.  (Fortunately  no  one  has  demanded that PARRY
generate  words   letter-by-letter   to   account   for   alternative
spellings).  Since  generation  of natural language output represents
one of the major  shortcomings  of  the  model,  we  are  at  present
attempting  to  couple  the  generation more closely with the model's
theory.

CRITICISM #7:
	The  model,  even  if successful as a simulation, is useless.
Does it teach us anything about paranoia? Can  it  be  used  to  help
patients suffering from paranoid disorders?

REPLY:
	The model represents an attempt to make intelligible paranoid
processes   in   explicit   symbol-processing  terms.    A  model  of
psychopathology in which the mind is in error about some of  its  own
processes  has  implications  for  prevention, reduction, and cure of
disorder. PARRY  intersects  two  expert  communities  consisting  of
researchers  in artificial intelligence and clinicians in psychiatry.
Clinicians are practical men  who  are  interested  in  technological
applications.
	If  the  disorder  is  at  the  "hardware"  level  of   brain
pathology, then the application of symbol-processing techniques might
be of little use.   But if there is reason to believe the disorder is
at  the  program level of learned, acquired strategies, then attempts
at re-programming  through  symbolic-semantic  techniques  are  worth
considering.   At present clinicians have great difficulties treating
paranoid disorders. Often the treatment is limited  to  tranquilizing
drugs.  For  a  clinician  practicing  behavior  therapy, the model's
theory suggests desensitizing the patient to humiliation, a technique
which  has  been  successful  with  other  negative  affects  such as
anxiety.   For those practicing  psychotherapy,  the  model's  theory
suggests  exploring the topics of humiliation and self-censure in the
hope of helping the patient to reject his judgements  of  himself  as
inadequate.   Judging  whether  these  treatments are effective would
depend on clinical evaluations.
	A  practical  application  for  PARRY  lies  in  its use as a
training aid. Medical students in psychiatry,  students  in  clinical
psychology, and psychiatric residents can practice interviewing PARRY
for hours before they "practice" on human patients.  They  can  learn
what  sorts  of  input  expressions  upset  the  model  and  lead  to
witholding  of  information  or  breaking  off   the   doctor-patient
relationship.

CRITICISM #8:
	PARRY does  not  tell  us  what  is  the  cause  of  paranoid
thinking. Effective  treatment  requires  we  know  the  cause  of  a
disorder.

REPLY:
	PARRY  does  not account for how a system got to be that way;
it describes only  how  the  system  now  works.  An  ontogenetic  or
morphogenetic model would show how a normal system became that way as
a result of its experience over time.
	It is not true that to have  effective  treatments  one  must
know  the  cause  of  a disorder. Illnesses involve loops and circles
which, if broken anywhere, can lead to relief of  the  disorder  even
when  the  mechanism  of  action  of the treatment is not understood.
Common examples of successful treatments  for  illnesses  of  unknown
causes  are  insulin  in  diabetes,  digitalis  in  congestive  heart
failure, colchicine in gout, and lithium in mania.

CRITICISM #9:

	The  tests PARRY has passed are not severe enough. If a model
passes a validation test, it might not be because it is a good  model
but because the test is weak.

REPLY:
	Our strongest test involves  having  judges  rate  interviews
with  versions  of  the model and with paranoid patients.  We utilize
statistical measures to  see  how  closely  the  model's  performance
matches  that  of  the  patients and how much better it performs than
previous model-versions. A recent study showed that on the  dimension
of  linguistic  comprehension  independent  raters gave PARRY2 a mean
rating of 5.48 on a scale of 0-9. (Colby, Hilf, Wittner, Faught,  and
Parkison,  1974).  A previous version of PARRY received a mean rating
of 5.25. This improvement is significant at the 0.05% level. But  the
model  is  still  far  from the 7.42 rating received by the patients.
The rating groups ( psychiatrists and graduate students )  have  been
shown  to be reliable, i.e., there is agreement both within groups of
raters and between groups.
	Stronger  tests  are  certainly  needed, and we would welcome
suggestions along these lines. Are there validation tests others have
used  which might be suitable for PARRY? In the past most models have
relied on face validity. To improve  a  model  measureably,  we  need
better tests and statistical measures. One weakness of A-I as a field
is that many of its models have not been  sufficiently  subjected  to
empirical tests.

CRITICISM #10:

	PARRY is excessively crude, sketchy, and immature as a model.
Such theoretical models can be premature for a field and can turn out
to be irrelevant or counterproductive.   We should collect more  data
about    naturally-occurring   paranoia   before   attempting   model
construction.

REPLY:

	No  one really knows when to begin theorizing. Even facts are
now believed to be  heavily  theory-laden,  whether  their  collector
realizes  it  or  not.  One of the perils   of model building is that
data used to test a model may demolish it. A model is only sufficient
unto the day.
	If  PARRY  is  not  acceptable,  then  one accepts some rival
formulation  (  a   current   one   is   "paranoia   represents   the
transformation of love into hate"), or one accepts nothing and waits.
Waiting for perfection can be paralyzing to a field, especially  one
devoted to patients who need help.
	As  a  simplification,  PARRY  is  perhaps  too simple at the
moment. In constructing a model, one strives  for  something  simpler
than  the  "real" referent system which is difficult to understand or
manipulate.  But  one  wants  to  retain   the   important   features
characteristic  of  the  natural  counterpart.    If the model is too
simple, it is  unable  to  reproduce  these  important  features  and
extrapolation  to  the  natural referent system becomes risky. If the
model is too complex, it  becomes  as  difficult  to  understand  and
manipulate  as  the  real  thing.  Faced  with  this dilemma, a model
builder can improve his model by simplifying it  or  making  it  more
complicated while retaining consistency.


                   REFERENCES

Colby, K.M.
   1974. ARTIFICIAL PARANOIA: A Computer Simulation of Paranoid
         Processes. Pergamon, New York, (In Press).

Colby, K.M., Weber, S., and Hilf, F.D.
   1971. Artificial Paranoia. ARTIFICIAL INTELLIGENCE, 2, 1-25.

Colby, K.M., Hilf, F.D., Weber, S., and Kraemer, H.
   1972. Turing-like Indistinguishability Tests for the Validation
         of a Computer Simulation of Paranoid Processes. ARTIFICIAL
         INTELLIGENCE, 3, 199-221.

Colby, K.M. and Hilf, F.D.
   1974. Multidemensional Evaluation of a Computer Simulation of
         Paranoid Thought. In KNOWLEDGE AND COGNITION, Gregg, L.
         (Ed.), Lawrence Ehrlbaum and Associates, Potomac, Maryland.
         (Also appears as Stanford Artificial Intelligence Laboratory
         Memo AIM-194, Computer Science Department, Stanford
         University).

Colby, K.M., Parkison, R.C., and Faught, B.
   1974. Pattern-matching Rules for the Recognition of Natural
         Language Dialogue Expressions. AMERICAN JOURNAL OF
         COMPUTATIONAL LINGUISTICS. Vol 1, Microfiche 5.
         (Also appears as Stanford Artificial Intelligence
         Laboratory Memo AIM-234, Computer Science Department,
         Stanford University).

Colby, K.M., Hilf, F.D., Wittner, W.K., Faught, B., and Parkison, R.C.
   1974. Measuring the Improvment in Linguistic Comprehension in
         a Model of Paranoid Processes. (Forthcoming).